A novel variable neighborhood search approach for cell clustering in spatial transcriptomics

This paper introduces a new approach to cell clustering based on the Variable Neighborhood Search (VNS) metaheuristic. The purpose of this method is to cluster cells based on both gene expression and spatial coordinates. We first formulated this clustering challenge as an Integer Linear Programming minimization problem. Our approach introduces a novel model based on the VNS technique, demonstrating its efficacy in navigating the complexities of cell clustering. Notably, our method extends beyond conventional cell-type clustering to spatial domain clustering. This adaptability enables our algorithm to form clusters based on information gleaned from both gene expression matrices and spatial coordinates. Our validation showed the superior performance of our method compared with existing techniques. Our approach advances current clustering methodologies and can potentially be applied in several fields, from biomedical research to spatial data analysis.

In order to contribute to this dynamic landscape, we introduce a novel methodology rooted in the Variable Neighborhood Search (VNS) approach [7], tailored for cell clustering in spatial transcriptomics. Our innovation seeks to elevate the precision and efficacy of cell clustering in spatial transcriptomic analyses, promising to reveal hidden facets of cellular organization and functionality. Although our initial investigations focused on datasets designed for cell-type clustering, it is essential to emphasize that our method's design accommodates spatial domain clustering as well. Here, we present a synthesis of computational skills and biological insights aimed at pushing the boundaries of our understanding of the complex cell interactions within tissues.

BACKGROUND
Clustering methods from the literature
Many methods in the literature can be used to partition an N-dimensional population into K sets based on specific rules. In this paper, we focus on some of the most popular clustering methods used in the field of data analysis: k-Means [8], Louvain [9], Leiden [10], and MClust [11]. While these methods share the goal of grouping data points, they differ in the types of data they are designed for, the objectives they optimize, and the algorithms they rely on. k-Means is a general-purpose clustering algorithm, Louvain and Leiden are tailored for community detection in networks, while MClust is a model-based clustering method. In the following subsections, we briefly describe each of these methods.

k-Means algorithm
The k-means algorithm [8] is a partitioning algorithm that divides a dataset into k clusters based on the similarity of data points. It starts by establishing k groups, each comprising a single randomly chosen point. Points are then added to these groups according to the principle that each new point is assigned to the group whose mean point is the most similar under some rule. After point allocation, the means of all groups are adjusted to incorporate the influence of the newly added points. Consequently, at each stage, the k means reflect the means of the groups they represent.
While this method is computationally efficient and handles extensive datasets well, it does not guarantee convergence to an optimal solution. Notably, issues arise from the random initialization of centroids, leading to unexpected convergence patterns. Moreover, the algorithm requires users to choose the number of clusters beforehand, which influences cluster shapes and susceptibility to outlier effects. However, certain special cases of the k-means algorithm exist in the literature for which convergence to an optimal solution is assured.
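To make this concrete, here is a minimal sketch using scikit-learn's KMeans on a hypothetical embedding matrix; the cluster count, data, and seed are illustrative assumptions, not values from this paper.

```python
# Minimal k-means sketch; `embeddings` is a hypothetical
# (n_cells x n_features) matrix standing in for real cell embeddings.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 50))  # placeholder data

# Several restarts (n_init) mitigate sensitivity to random centroid initialization.
kmeans = KMeans(n_clusters=7, n_init=10, random_state=0)
labels = kmeans.fit_predict(embeddings)  # one cluster index per cell
```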

Louvain algorithm
The Louvain algorithm, developed by Blondel et al. [9], is designed for detecting communities in network or graph data. This algorithm aims to optimize modularity, a measure of the quality of a network's division into communities, using two phases: (1) local moving of nodes and (2) aggregation of the network. In the first phase, individual nodes are moved to the community that yields the largest increase in the quality function. In the second phase, an aggregate network is obtained based on the partition, with each community in the partition becoming a node in the aggregate network. These two phases are repeated until the quality function cannot be increased further. However, the Louvain algorithm can produce communities with arbitrarily poor connectivity. In the most adverse scenarios, these communities may become entirely disconnected, particularly during iterative executions of the algorithm.
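As an illustration, here is a minimal sketch assuming the python-louvain package and a stand-in graph; in a transcriptomics pipeline the graph would typically be a k-nearest-neighbor graph over cell embeddings.

```python
# Minimal Louvain sketch with python-louvain (pip install python-louvain).
import networkx as nx
import community as community_louvain

G = nx.karate_club_graph()  # stand-in for a cell-cell neighbor graph
partition = community_louvain.best_partition(G)  # dict: node -> community id
n_communities = len(set(partition.values()))
```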

Leiden algorithm
To address the connectivity issues of the Louvain algorithm, Traag et al. introduced the Leiden algorithm [10]. The Leiden algorithm guarantees that communities are well connected and, when applied iteratively, converges to a partition in which all subsets of all communities are locally optimally assigned. By leveraging the smart local move algorithm (an enhancement of the Louvain algorithm) and incorporating fast local node movements and the moving of nodes to random neighbours, the Leiden algorithm combines the most promising directions for improving the Louvain algorithm.
The Leiden algorithm consists of three phases: (1) local moving of nodes, (2) refinement of the partition, and (3) aggregation of the network based on the refined partition, using the non-refined partition to create an initial partition for the aggregate network. Thus, the algorithm optimizes a quality function to identify communities by considering the density of connections within the communities.
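For illustration, a minimal sketch assuming the leidenalg and python-igraph packages, run on a stand-in graph; the plain-modularity partition type is an assumption, as transcriptomics pipelines often use a resolution-parametrized variant.

```python
# Minimal Leiden sketch (pip install leidenalg python-igraph).
import igraph as ig
import leidenalg

g = ig.Graph.Famous("Zachary")  # stand-in for a cell-cell neighbor graph
# Optimize modularity; RBConfigurationVertexPartition with a resolution
# parameter is another common choice in practice.
partition = leidenalg.find_partition(g, leidenalg.ModularityVertexPartition)
labels = partition.membership  # community index per node
```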

MClust
MClust [11], applied in cell clustering, identifies distinct cell groups based on observed features using Gaussian mixture models [12]. Unlike many other clustering algorithms, MClust accommodates various cluster shapes, making it suitable for complex situations. It utilizes the Expectation-Maximization algorithm [13] for parameter estimation, offering robust handling of missing data and complex distributions. This model-based clustering tool is powerful in uncovering patterns within complex biological datasets, such as those from single-cell omics technologies. Initially designed for single-cell RNA sequencing data, it can also be applied to spatial transcriptomic data, with its effectiveness depending on data characteristics and analysis goals.
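MClust itself is an R package; as a hedged Python analogue, the following sketch fits a Gaussian mixture by Expectation-Maximization with scikit-learn. The data, component count, and covariance structure are illustrative assumptions.

```python
# Gaussian mixture clustering in Python, analogous in spirit to MClust.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 20))  # placeholder cell embeddings

# covariance_type='full' allows ellipsoidal clusters of varying shape,
# similar in spirit to MClust's flexible covariance models.
gmm = GaussianMixture(n_components=7, covariance_type="full", random_state=0)
labels = gmm.fit_predict(embeddings)  # EM fit, then most-likely component
```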

Embedding methods from the literature
In spatial transcriptomics, where data are organized as a matrix of cells and genes, the high dimensionality (often exceeding 30,000 genes) and sparsity pose analytical challenges. Dimensionality reduction methods play a key role in addressing these issues. These techniques help distill meaningful patterns from the data, facilitating more efficient analyses.
The generation of embeddings, achieved through established methods from the literature, aims to transform the high-dimensional gene space into a more manageable form. This process enables a clearer exploration of spatial relationships, cell heterogeneity, and underlying biological processes. By leveraging validated methods from the existing literature, we ensure a scientifically rigorous approach, condensing rich gene expression profiles into interpretable embeddings while addressing computational complexities.

STAGATE
The STAGATE method [14] has been designed for spatial clustering and denoising of spatially resolved transcriptomics data. The method generates low-dimensional latent embeddings that combine spatial information and gene expression via a graph attention auto-encoder. Notably, it adopts an attention mechanism in the middle layer of the encoder and decoder, which learns the edge weights of spatial neighbor networks and uses them to update spot representations by collectively aggregating information from neighboring spots.

Principal Component Analysis
PCA [15] is a statistical method for dimensionality reduction and data visualization. It is a mathematical procedure that transforms a set of correlated variables into a new set of uncorrelated variables known as principal components. The principal components are linear combinations of the original variables and are sorted by how much of the variance in the data they account for; i.e., the first principal component accounts for the highest variance. PCA finds widespread application across domains, including data analysis, machine learning, and image processing, where it is used to streamline intricate datasets and uncover patterns or associations between variables.
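For instance, a minimal sketch with scikit-learn, assuming a placeholder cells-by-genes matrix and a 50-component target (the PCA dimensionality used later in this paper):

```python
# Reducing a hypothetical gene-expression matrix (cells x genes)
# to 50 principal components before clustering.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
expression = rng.poisson(1.0, size=(500, 2000)).astype(float)  # placeholder

pca = PCA(n_components=50)
embeddings = pca.fit_transform(expression)
explained = pca.explained_variance_ratio_  # sorted; first component is largest
```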

GraphST
GraphST [16] is a self-supervised contrastive learning technique designed to make full use of spatial transcriptomics data. Integrating graph neural networks with self-supervised contrastive learning, the method learns spot representations that are both informative and discriminative. This is achieved by mutually minimizing the embedding distance between spatially neighboring spots.

Cell Clustering for Spatial Transcriptomics data
CCST [17] leverages graph convolutional networks (GCNs) to integrate gene expression data and comprehensive spatial information from individual cells. The relationships between variables are captured as a graph, with the adjacency matrix representing connections among variables and the node feature matrix reflecting variable observations. The GCN layer is designed to fuse the graph (in our case, the spatial structure) and the node features (gene expression). Initially, the data are transformed into a graph whose nodes represent cells, with gene expression profiles as attributes, and whose edges represent neighborhood relationships between cells. Subsequently, a sequence of GCN layers is used to incorporate graph and gene expression details into cell node embedding vectors. Concurrently, the graph is perturbed to generate negative embeddings. By learning this discrimination task, the neural network model is trained to encode cell embeddings derived from spatial gene expression data, which are subsequently used for cell clustering.
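As a schematic illustration of the encoder idea (not the authors' implementation), the following sketch, assuming PyTorch Geometric, stacks two GCN layers over a cell graph whose node features are gene expression profiles:

```python
# Schematic CCST-style encoder: GCN layers fuse the cell graph with
# gene-expression node features to produce cell embeddings.
import torch
from torch_geometric.nn import GCNConv

class CellGCN(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, out_dim)

    def forward(self, x, edge_index):
        # x: (n_cells, in_dim) gene features; edge_index: (2, n_edges) neighbors
        h = torch.relu(self.conv1(x, edge_index))
        return self.conv2(h, edge_index)  # cell embeddings for clustering
```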

IMPLEMENTATION
Mathematical model
Let the set of all cells be denoted by $C = \{c_1, \ldots, c_n\}$ and the total number of cells be equal to $n$.
For each cell $c_i$, $i = 1, \ldots, n$, let $c_i^x$ and $c_i^y$ represent its $x$ and $y$ coordinates, and let the vector $emb_i = [emb_i^1, \ldots, emb_i^M]$ represent its embedding values ($M$ is the total number of embedding values). Furthermore, let the distance function $D : C \times C \to \mathbb{R}^+$ be defined as a measure of the similarity between cells. In our model, for two cells $c_i$ and $c_j$, the distance $D$ was calculated as follows:

$$D(c_i, c_j) = \alpha \, D_{gene}(c_i, c_j) + (1 - \alpha) \, D_{coord}(c_i, c_j),$$

where $\alpha$ is an input parameter, $D_{gene}$ is the cosine distance (one minus the cosine similarity) between cell embeddings, and $D_{coord}$ is the Euclidean distance between cell coordinates. In our model, we chose $K$ different cells from the set of cells $C$ to represent clusters and called these cells centroids. Therefore, let the binary variables $x_{ij}$ ($i, j = 1, \ldots, n$) and $y_j$ be defined in the following way:

$$x_{ij} = \begin{cases} 1, & \text{if cell } c_i \text{ belongs to the cluster represented by centroid } c_j \\ 0, & \text{otherwise} \end{cases}$$

$$y_j = \begin{cases} 1, & \text{if cell } c_j \text{ is a centroid} \\ 0, & \text{otherwise} \end{cases}$$

The Integer Linear Programming formulation of the clustering problem can be described as follows:

$$\min \sum_{i=1}^{n} \sum_{j=1}^{n} D(c_i, c_j) \, x_{ij} \quad (1)$$

subject to these constraints:

$$\sum_{j=1}^{n} x_{ij} = 1, \quad i = 1, \ldots, n \quad (2)$$
$$x_{ij} \le y_j, \quad i, j = 1, \ldots, n \quad (3)$$
$$\sum_{j=1}^{n} y_j = K \quad (4)$$
$$x_{ij}, y_j \in \{0, 1\}, \quad i, j = 1, \ldots, n \quad (5)$$

The objective function (1) represents the sum of distances from each cell to its most similar cluster representative and should be minimized. Equation (2) indicates that each cell is assigned to exactly one cluster. Before a cell can be assigned to a cluster, the cluster's centroid must be selected (3). The total number of clusters is equal to $K$ (4). All variables are constrained to be binary (5).
The model described by Equations (1)-(5) is based on the p-median problem and is presented in a similar form by Davidović et al. [19].
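To illustrate the distance definition above, here is a minimal sketch assuming NumPy/SciPy; the function name and the normalization of the coordinate term are assumptions, as the paper's exact scaling may differ:

```python
# Sketch of the combined distance D = alpha*D_gene + (1-alpha)*D_coord.
import numpy as np
from scipy.spatial.distance import cdist

def combined_distance(emb, coords, alpha=0.5):
    d_gene = cdist(emb, emb, metric="cosine")        # 1 - cosine similarity
    d_coord = cdist(coords, coords, metric="euclidean")
    d_coord = d_coord / d_coord.max()                # assumed normalization step
    return alpha * d_gene + (1.0 - alpha) * d_coord  # n x n distance matrix
```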

Variable neighborhood search method
The VNS method is a well-known metaheuristic. It starts from one point in the search space, explores its neighborhoods, and repeats the process until a better solution is found or the stopping criteria are reached. The method was first proposed by Mladenović [20] and later elaborated by Mladenović and Hansen [21] and Hansen and Mladenović [22].
Before we introduce the VNS method, let us define the set $N_k(X)$, $k = k_{min}, \ldots, k_{max}$, as the set of all vectors $X'$ that differ from the solution $X$ in the $k$th order (i.e., in $k$ elements), and call that set the $k$th neighborhood of the solution $X$.
The VNS-based heuristic starts from an initial feasible solution $X$, shakes it by creating another solution $X' \in N_k(X)$, and then applies a local search method to create a better feasible solution $X''$. If the feasible solution $X''$ obtained by the local search procedure is not better than the current incumbent $X$ ($F(X'') \ge F^*$), the VNS algorithm repeats the procedure by shaking in the neighborhood $N_{k+k_{step}}$ (i.e., $k$ is incremented by $k_{step}$) and performing a local search within it. This is repeated until $k$ reaches its maximum, $k_{max}$. Otherwise, if $F(X'') < F^*$, $F^*$ becomes $F(X'')$ and $k$ is reset to $k_{min}$. The procedure of changing the neighborhood enables the VNS algorithm to escape local minima. The process is repeated until a certain number of iterations or another stopping criterion is reached.
Pseudo-code for the basic VNS algorithm is presented as Algorithm 1. Implementations of the functions InitialSolution(), Shake(), LocalSearch(), and StoppingCondition() defined for our clustering problem are described in the following subsection.
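As Algorithm 1 itself is not reproduced here, the following is a schematic Python sketch of the basic VNS loop described above; the callable interfaces, the time-based stopping condition, and the wrap-around of $k$ after $k_{max}$ are illustrative assumptions:

```python
# Schematic basic-VNS loop; shake/local_search/objective are problem-specific.
import time

def vns(initial_solution, objective, shake, local_search,
        k_min=1, k_max=5, k_step=1, time_limit=60.0):
    best = initial_solution()
    f_best = objective(best)
    k = k_min
    start = time.time()
    while time.time() - start < time_limit:   # StoppingCondition()
        candidate = shake(best, k)            # random point in N_k(best)
        improved = local_search(candidate)
        f_new = objective(improved)
        if f_new < f_best:                    # improvement: accept, reset k
            best, f_best, k = improved, f_new, k_min
        else:                                 # enlarge neighborhood; wrap past k_max
            k = k + k_step if k + k_step <= k_max else k_min
    return best, f_best
```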

VNS for the cell clustering problem
With respect to the problem's definition, let us assume that all cells are represented by the numbers from 1 to $n$. Specifically, cells are represented by the set $C = \{c_i\}$, $n = |C|$, and for each cell $c_i$ there are two types of data: the $x$ and $y$ coordinates of the cell ($c_i^x$ and $c_i^y$) and the embedding values (vector $emb_i$). In our representation, the solution vector $Y = [y_1, \ldots, y_K]$ contains the indexes of the $K$ cells chosen as cluster representatives; cell $y_i$ is the centroid of the $i$th cluster. From the centroid solution vector $Y$ we obtain the vector $X = [x_i]$ of size $n$ in the following way: $x_i$, $i = 1, \ldots, n$, is the centroid from the vector $Y$ closest to the $i$th cell. This representation satisfies all conditions described by Equations (2)-(5). Using it, our goal was to minimize the value of the objective function (1), i.e., the sum of distances from each cell to its closest centroid.
The function InitialSolution() randomly chooses $K$ mutually different numbers from the set $\{1, \ldots, n\}$ and returns them as a $K$-dimensional vector $Y$. For every solution vector $Y$, the vector $X$ is obtained in the following way: for each cell $i$, the distance $D$ between the cell $i$ and all centroids $y_j$ from the vector $Y$ is calculated; then, $x_i$ is set equal to the $y_j$ for which the distance $D$ is minimal. That is, whenever the vector $Y$ changes, the vector $X$ is also updated. In addition, to avoid repeated calculations, the distance $D$ between all pairs of cells is calculated once and saved as a distance matrix.
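A minimal sketch of this representation, assuming a precomputed $n \times n$ distance matrix; the names and signatures are illustrative:

```python
# InitialSolution(), the centroid-assignment step, and the objective F(Y);
# `distance` is the precomputed n x n matrix of combined distances D.
import numpy as np

def initial_solution(n, K, rng):
    # K mutually different cell indices chosen uniformly at random
    return rng.choice(n, size=K, replace=False)

def assign_cells(Y, distance):
    # x[i] = centroid in Y closest to cell i (recomputed whenever Y changes)
    return Y[np.argmin(distance[:, Y], axis=1)]

def objective(Y, distance):
    # F(Y): sum of distances from each cell to its nearest centroid
    return distance[:, Y].min(axis=1).sum()
```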
The Shake() function takes two inputs: the incumbent $Y$ and the size $k$ of the neighborhood that needs to be explored. The Shake() function randomly chooses $k$ elements of the vector $Y$ and replaces them with $k$ randomly chosen elements from the set $\{1, \ldots, n\}$ that are different from all elements of the current $Y$. This means that after some elements are changed, all elements of the vector $Y$ remain mutually different. In other words, the Shake() function chooses a vector $Y'$ from $N_k(Y)$.
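A sketch of this behavior, under the same assumptions as above:

```python
# Shake(): replace k randomly chosen centroids with k cells not currently
# in Y, keeping all entries mutually different.
import numpy as np

def shake(Y, k, n, rng):
    Y_new = Y.copy()
    positions = rng.choice(len(Y), size=k, replace=False)
    outside = np.setdiff1d(np.arange(n), Y)            # cells not in Y
    Y_new[positions] = rng.choice(outside, size=k, replace=False)
    return Y_new
```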
The LocalSearch() function takes the vector $Y'$, the distance matrix, and the parameters $m$ and $p$ as inputs. In our implementation, we used the first-improvement strategy. Based on the value of the parameter $m$, for each element of the vector $Y'$, the LocalSearch() function first chooses a random integer $ind \in [0, m]$; then, based on the $ind$ value, it either keeps the observed element of the vector $Y'$ as it is ($ind == 0$) or replaces it with a new one ($ind > 0$). For $ind \ge 2$, the observed element is replaced with one of the candidates from a candidate set created within the LocalSearch() function: the function finds the $ind$ candidates with the smallest distance to the observed element, sorts this list, excludes all candidates already present in the vector $Y'$, and then chooses one candidate for the replacement. Note that the smallest distance between the observed element and itself is zero, so the condition $ind > 1$ is necessary; in case $ind == 1$, $ind$ is drawn again until its value differs from 1. Additionally, if the candidate list is empty after excluding all elements that already exist in the vector $Y'$, a random candidate is chosen from the set $\{1, \ldots, n\} \setminus \{y_1, \ldots, y_K\}$.
Finally, after the procedure of replacing or keeping elements of the vector $Y'$ is finished, i.e., a new vector $Y''$ is obtained, the LocalSearch() function calculates $F(Y'')$; if $F(Y'') < F^*$, the first improvement has been made, and the function returns the vector $Y''$ as the output. The process of examining the elements of the vector $Y'$ and replacing them with new values is repeated only if no improvement is made, and not more than $p$ times. In case no improvement is made after the process has been repeated $p$ times, the vector $Y'' = Y'$ is returned as the output of this function.
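A simplified sketch of this first-improvement local search, following the description above; tie handling and candidate ordering are assumptions and may differ from the authors' implementation:

```python
# Simplified first-improvement LocalSearch() sketch.
import numpy as np

def local_search(Y, distance, m, p, rng):
    n = distance.shape[0]
    f_best = objective(Y, distance)          # objective() as sketched earlier
    for _ in range(p):                       # at most p passes without improvement
        Y_new = Y.copy()
        for pos, centroid in enumerate(Y):
            ind = rng.integers(0, m + 1)     # random integer in [0, m]
            while ind == 1:                  # ind == 1 would select the element itself
                ind = rng.integers(0, m + 1)
            if ind == 0:
                continue                     # keep this centroid as it is
            # the `ind` nearest cells to the centroid (itself first, at distance 0)
            candidates = np.argsort(distance[centroid])[:ind]
            candidates = np.setdiff1d(candidates, Y_new)  # drop existing centroids
            if candidates.size == 0:         # fall back to a random unused cell
                candidates = np.setdiff1d(np.arange(n), Y_new)
            Y_new[pos] = rng.choice(candidates)
        if objective(Y_new, distance) < f_best:
            return Y_new                     # first improvement found
    return Y                                 # no improvement after p passes
```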

Evaluation method
We used the Adjusted Rand Index (ARI) [28] to evaluate the results and compare them with each other. ARI is a measure used to evaluate the similarity between two clusterings. It quantifies the agreement between the true and predicted clustering, adjusting for the amount of agreement that could occur by chance. ARI values range from −1 to 1, where 1 indicates perfect agreement, 0 indicates agreement expected by chance, and negative values suggest less agreement than expected by chance.
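For example, with scikit-learn's implementation, identical partitions score 1.0 regardless of the label names used:

```python
# ARI is invariant to permutations of the cluster labels.
from sklearn.metrics import adjusted_rand_score

truth = [0, 0, 1, 1, 2, 2]
predicted = [1, 1, 0, 0, 2, 2]  # same partition, different label names
print(adjusted_rand_score(truth, predicted))  # 1.0
```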

Results of the VNS method across various scenarios with single-slice datasets
Due to the sparsity of the gene expression matrix, and to ensure a fair comparison, embeddings were obtained using various methods from the literature (PCA, STAGATE, GraphST, and CCST) for both Stereo-seq datasets. All of these methods create embeddings that significantly reduce the number of genes to a much smaller set of features. For instance, for the Forebrain dataset, the CCST method reduced the number of genes to 128 features, STAGATE to 64 features, PCA to 50 features, and GraphST to 20 features. For the E2 dataset, all parameters were the same except for STAGATE, where the number of features was lowered to 30. Hence, the input data depend on the number of cells and the number of obtained features (embeddings). The standard clustering methods from the literature (k-Means, MClust, Louvain, and Leiden) and the proposed VNS method for cell clustering were applied to the generated embeddings. The results of this testing are presented in Tables 1 and 2.
The goal of the VNS method was to find the solution with the smallest cost function value, and these results are shown in Table 1. Table 1 reports results obtained by the VNS method only and is organized as follows: the first column presents the name of the embedding used as the input to the VNS method, while the following four columns ($f_{VNS}$, $t_{VNS}$, err, and σ) show the smallest cost function value, the corresponding running time, and the statistical analysis of all solutions obtained by VNS relative to that cost function value, in that order. Due to the stochastic nature of the metaheuristic, the VNS algorithm was run 20 times (with 20 different seeds) for each embedding, and information about the best solution found in these 20 runs is provided in these four columns: $f_{VNS}$ is the minimal cost function value obtained over the 20 runs, $t_{VNS}$ is the corresponding running time, and err and σ contain additional information on the quality of the solutions obtained across the runs. For each embedding method, the results obtained by VNS are presented in separate rows.
The results presented in Table 2 are organized into three groups. Similar to Table 1, the first column (first group) presents the name of the method used for creating the embedding.
The next ten columns present the results for each clustering method separately; for each method, we provide the ARI score (ARI) and the running time ($t$) in seconds. The ARI and $t$ values under the VNS columns stand for the best ARI score obtained over all testing combinations and the corresponding running time. The highest ARI score achieved for a dataset among all clustering methods is highlighted in bold, while the second-best ARI score is marked with an asterisk (*).
In both tables, the first set of results corresponds to the E2 dataset, and the next corresponds to the Forebrain dataset.The E2 dataset results are visualized in Figure 1, while the Forebrain dataset results are visualized in Figure 2.

VNS clustering achieves better results than other tested methods using the E2 dataset
From the first part of the results shown in Table 1, we can conclude that, using the PCA embedding, in all 20 runs the values of the cost function are very close to the lowest cost function value (err < 3.5, σ < 1.5%). Using STAGATE, there are some differences, although σ is still below 5%, implying that the VNS method is stable with both embeddings. The results of VNS clustering when the smallest cost function values are reached are visualized in Figure 1a, while the results with the best ARI score achieved by all clustering methods are shown in Figure 1b.

The VNS method outperforms other methods when clustering cells from the Forebrain dataset
By examining the values in the err and σ columns of Table 1 for the Forebrain dataset, it can easily be seen that the differences between the results obtained in the 20 runs are very small. In fact, the difference between the best-found solution (the solution with the minimal cost function value) and the other 19 solutions is less than 5%. This means that the solutions found in all 20 runs were very close to the smallest one. Also, from the results in the column $t_{VNS}$, we can observe that the running time was less than 1 minute for three of the embedding types and less than 2 minutes for the remaining one. Moreover, from the results presented in Table 2 for the Forebrain dataset, we can see that, in the majority of cases, VNS had the highest ARI score compared with the other methods (for three types of embedding, the VNS ARI score was the highest), and the running time was less than 1 minute for each type of embedding. The only embedding for which VNS did not find the solution with the best ARI score was PCA; for this embedding, the best ARI score was obtained by the k-Means method. By analyzing the results in Tables 1 and 2, we conclude that the VNS method achieves the best ARI score with the STAGATE embedding and that, in all 20 runs, all solutions were close to the one with the lowest cost function (err < 1%). The results obtained with the minimal cost function and the maximal ARI score are visualized in Figure 2.

VNS demonstrates superior performance on multi-slice datasets
Next, we compared the clustering accuracy of the VNS method with that of the other clustering methods using embeddings obtained by the STAligner method only. Compared with the embedding methods used for single-slice datasets, it is worth mentioning that STAligner reduces the number of genes to 30 features. The results of this comparison are presented in Tables 3 and 4. Table 3 is organized similarly to Table 1. The only difference is in the first column, here called Slice name. Since each DLPFC dataset consists of four slices, this column contains the names of the first and the last slices of the particular dataset; the same applies to the other multi-slice datasets. Thus, each row represents the results for one separate DLPFC dataset.
Table 4 is organized similarly to Table 2; however, the column Embeddings is replaced by the column Slice name, which presents the names of the first and the last slices of a particular multi-slice dataset, as in Table 3. The results for each dataset are presented in separate rows, as in Table 3. The results from Table 3 are visualized in Figure 3.
As we see from the columns err and σ in Table 3, in all 20 runs the VNS method obtained results similar to the ones with the smallest cost function (err < 5.8%, σ < 2.5%). Again, these results imply that the method is stable even for multi-slice datasets. The fact that the values in the column $t_{VNS}$ are smaller than 5 implies that the method can obtain results for four slices of these types of datasets in less than 5 s.

From the results presented in Table 4, it can be concluded that the method proposed in this paper outperforms the other clustering methods in all aspects. Specifically, for each of the datasets we tested, the ARI score was the highest and the running time was the lowest when the VNS method was used.

DISCUSSION AND CONCLUSION
Here, we introduced a novel approach suitable for clustering both single- and multi-slice spatial transcriptomics datasets. This is the first application of a metaheuristic method, VNS, to the clustering of spatial transcriptomic data. The essence of the VNS implementation presented in this study is the utilization of a combinatorial optimization algorithm, in this instance a metaheuristic, strategically designed to deliver sufficiently good solutions to optimization and machine learning challenges while minimizing computational resources. This approach is intended to offer a robust and computationally efficient solution for cell clustering in spatial transcriptomics.
Our analysis demonstrated that the performance of clustering methods is significantly influenced by the choice of embeddings and the way they were generated. Notably, the VNS approach combined with PCA embeddings yields results that closely align with the ground truth, as illustrated in Figure 2b. When benchmarked against existing techniques, our method consistently outperforms them in terms of efficiency and ARI scores. The algorithm's speed and stability are commendable, and its flexibility is evidenced by a comprehensive set of parameters that can be tailored to meet diverse user requirements. Future research will extend the method's application to time-series datasets and explore additional VNS modifications and embedding techniques to enhance its utility.
STAligner
STAligner is a specialized tool for aligning and integrating spatially resolved transcriptomics data. It begins by normalizing expression profiles for all spots and creating a spatial neighbor network based on spatial coordinates. Employing a graph attention auto-encoder neural network, STAligner extracts spatially aware embeddings and uses spot triplets to guide the alignment process, fostering similarity among related spots and distinction among dissimilar ones across slices. The introduction of a triplet loss refines spot embeddings by minimizing the distance from the anchor to positive spots and increasing the distance to negative spots. This iterative process alternates triplet construction and auto-encoder training until batch-corrected embeddings are obtained. Furthermore, STAligner's versatility extends to integrating spatial transcriptomics datasets, facilitating alignment and concurrent identification of spatial domains across diverse biological samples, technological platforms, developmental stages, disease conditions, and consecutive tissue slices for 3D alignment.
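As an illustration of the triplet loss described here (not STAligner's exact code), a minimal sketch assuming PyTorch; the margin value is an assumption:

```python
# Illustrative triplet loss: pull anchor embeddings toward positive spots
# and push them away from negative spots by at least `margin`.
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=1.0):
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return torch.clamp(d_pos - d_neg + margin, min=0.0).mean()
```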

Figure 1. (a) Results of the VNS clustering on the E2 dataset. The first figure on the left presents the ground truth data. These results were obtained using the VNS method with PCA, STAGATE, GraphST, and CCST embeddings. (b) Clustering results for the E2 dataset. Each row presents the clustering results obtained by k-Means, MClust, Louvain, Leiden, and VNS over a certain embedding method. The first row presents the results obtained by all clustering methods when using the PCA embedding; the next three rows used the STAGATE, GraphST, and CCST embeddings.

Figure 2. (a) Results of the VNS clustering on the Forebrain dataset. The first figure on the left presents the ground truth data. These results were obtained using the VNS method with PCA, STAGATE, GraphST, and CCST embeddings. (b) Clustering results for the Forebrain dataset. Each row presents the clustering results obtained by k-Means, MClust, Louvain, Leiden, and VNS over a certain embedding method. The first row presents the results obtained by all clustering methods when using the PCA embedding; the next three rows used the STAGATE, GraphST, and CCST embeddings.

Figure 3. The clustering results on the DLPFC datasets 151507-151510, 151669-151672, and 151673-151676 are presented in panels (a), (b), and (c), respectively. The first column shows the ground truth data, while the subsequent columns display the results obtained using k-Means, MClust, Louvain, Leiden, and the VNS method with STAligner embeddings.

Table 1. VNS solution for single-slice datasets. Values in the columns $f_{VNS}$, $t_{VNS}$, err, and σ are obtained as explained in the Analysis section.

Table 2. Clustering method comparison for single-slice datasets. The highest ARI score achieved for a dataset among all clustering methods is highlighted in bold, while the second-best ARI score is marked with an asterisk (*). Here, $err = \frac{1}{20}\sum_{i=1}^{20} err_i$, where $err_i = |VNS_i - f_{VNS}| / |VNS_i|$ and $VNS_i$ is the VNS solution obtained in the $i$th run (seed). The value σ is the standard deviation of err, calculated as $\sigma = \sqrt{\frac{1}{20}\sum_{i=1}^{20}(err_i - err)^2}$.

Table 4. Clustering method comparison for multi-slice datasets. The highest ARI score achieved for a dataset among all clustering methods is highlighted in bold, while the second-best ARI score is marked with an asterisk (*).